Goto

Collaborating Authors

 real-world label noise


AlleNoise -- large-scale text classification benchmark dataset with real-world label noise

arXiv.org Artificial Intelligence

Label noise remains a challenge for training robust classification models. Most methods for mitigating label noise have been benchmarked using primarily datasets with synthetic noise. While the need for datasets with realistic noise distribution has partially been addressed by web-scraped benchmarks such as WebVision and Clothing1M, those benchmarks are restricted to the computer vision domain. With the growing importance of Transformer-based models, it is crucial to establish text classification benchmarks for learning with noisy labels. In this paper, we present AlleNoise, a new curated text classification benchmark dataset with real-world instance-dependent label noise, containing over 500,000 examples across approximately 5,600 classes, complemented with a meaningful, hierarchical taxonomy of categories. The noise distribution comes from actual users of a major e-commerce marketplace, so it realistically reflects the semantics of human mistakes. In addition to the noisy labels, we provide human-verified clean labels, which help to get a deeper insight into the noise distribution, unlike web-scraped datasets typically used in the field. We demonstrate that a representative selection of established methods for learning with noisy labels is inadequate to handle such real-world noise. In addition, we show evidence that these algorithms do not alleviate excessive memorization. As such, with AlleNoise, we set the bar high for the development of label noise methods that can handle real-world label noise in text classification tasks. The code and dataset are available for download at https://github.com/allegro/AlleNoise.


Human labeling errors and their impact on ConvNets for satellite image scene classification

arXiv.org Artificial Intelligence

Convolutional neural networks (ConvNets) have been successfully applied to satellite image scene classification. Human-labeled training datasets are essential for ConvNets to perform accurate classification. Errors in human-labeled training datasets are unavoidable due to the complexity of satellite images. However, the distribution of human labeling errors on satellite images and their impact on ConvNets have not been investigated. To fill this research gap, this study, for the first time, collected real-world labels from 32 participants and explored how their errors affect three ConvNets (VGG16, GoogleNet and ResNet-50) for high-resolution satellite image scene classification. We found that: (1) human labeling errors have significant class and instance dependence, which is fundamentally different from the simulation noise in previous studies; (2) regarding the overall accuracy of all classes, when human labeling errors in training data increase by one unit, the overall accuracy of ConvNets classification decreases by approximately half a unit; (3) regarding the accuracy of each class, the impact of human labeling errors on ConvNets shows large heterogeneity across classes. To uncover the mechanism underlying the impact of human labeling errors on ConvNets, we further compared it with two types of simulated labeling noise: uniform noise (errors independent of both classes and instances) and class-dependent noise (errors independent of instances but not classes). Our results show that the impact of human labeling errors on ConvNets is similar to that of the simulated class-dependent noise but not to that of the simulated uniform noise, suggesting that the impact of human labeling errors on ConvNets is mainly due to class-dependent errors rather than instance-dependent errors.


NoisywikiHow: A Benchmark for Learning with Real-world Noisy Labels in Natural Language Processing

arXiv.org Artificial Intelligence

Large-scale datasets in the real world inevitably involve label noise. Deep models can gradually overfit noisy labels and thus degrade model generalization. To mitigate the effects of label noise, learning with noisy labels (LNL) methods are designed to achieve better generalization performance. Due to the lack of suitable datasets, previous studies have frequently employed synthetic label noise to mimic real-world label noise. However, synthetic noise is not instance-dependent, making this approximation not always effective in practice. Recent research has proposed benchmarks for learning with real-world noisy labels. However, the noise sources within may be single or fuzzy, making benchmarks different from data with heterogeneous label noises in the real world. To tackle these issues, we contribute NoisywikiHow, the largest NLP benchmark built with minimal supervision. Specifically, inspired by human cognition, we explicitly construct multiple sources of label noise to imitate human errors throughout the annotation, replicating real-world noise, whose corruption is affected by both ground-truth labels and instances. Moreover, we provide a variety of noise levels to support controlled experiments on noisy data, enabling us to evaluate LNL methods systematically and comprehensively. After that, we conduct extensive multi-dimensional experiments on a broad range of LNL methods, obtaining new and intriguing findings.